Web Scraping with Python

This Python script crawls a website and extracts links, email addresses, and WhatsApp links from the specified domain (stei.itb.ac.id). It uses the requests library to fetch pages and BeautifulSoup to parse the HTML.

Usage

1. Install the required libraries:
```
pip install requests beautifulsoup4
```
2. Edit main.py to set the target domain (DOMAIN), the starting URL (HOME_URL), and any other settings you need.
3. Run the script:
```
python main.py
```
The script then:
- Visits the home URL (HOME_URL) and extracts all links that belong to the target domain (DOMAIN).
- Recursively follows those links within the domain to discover additional pages.
- Collects email addresses (mailto: links) and WhatsApp links (api.whatsapp.com).
- Saves the extracted links, emails, and WhatsApp links to separate log files (scrape-links-stei.log, scrape-email-stei.log, scrape-whatsapp-stei.log).
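The crawl described above can be sketched as follows. This is a minimal illustration, not the code in main.py: the function names (`crawl`, `fetch_hrefs`, `is_on_domain`), the `max_pages` cap, and the rule that subdomains of DOMAIN count as in scope are all assumptions. It also uses an explicit queue (breadth-first) instead of recursive calls, which keeps Python's recursion limit out of the picture on large sites. Page fetching is passed in as a function so the link handling can be tried without network access.

```python
from collections import deque
from typing import Callable, Iterable
from urllib.parse import urldefrag, urljoin, urlparse


def is_on_domain(url: str, domain: str) -> bool:
    """True if the URL's host is `domain` or a subdomain of it (assumed scope rule)."""
    host = urlparse(url).hostname or ""
    return host == domain or host.endswith("." + domain)


def fetch_hrefs(url: str, timeout: float = 10) -> list[str]:
    """Download one page with requests and return every <a href> value found by BeautifulSoup."""
    import requests  # imported lazily so the crawl logic can run without these packages
    from bs4 import BeautifulSoup

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True)]


def crawl(
    home_url: str,
    domain: str,
    get_hrefs: Callable[[str], Iterable[str]] = fetch_hrefs,
    max_pages: int = 500,
) -> dict[str, set[str]]:
    """Breadth-first crawl of `domain`, collecting page links, emails and WhatsApp links."""
    links: set[str] = {home_url}  # doubles as the "seen" set, so each URL is queued once
    emails: set[str] = set()
    whatsapp: set[str] = set()
    queue = deque([home_url])
    pages_fetched = 0

    while queue and pages_fetched < max_pages:
        page = queue.popleft()
        pages_fetched += 1
        try:
            hrefs = get_hrefs(page)
        except OSError:  # requests' exceptions derive from OSError
            continue  # skip pages that fail to download

        for href in hrefs:
            href = href.strip()
            if href.lower().startswith("mailto:"):
                # Drop any "?subject=..." part and keep only the address.
                address = href[len("mailto:"):].split("?", 1)[0]
                if address:
                    emails.add(address)
            elif "api.whatsapp.com" in href:
                whatsapp.add(href)
            else:
                # Resolve relative links and strip "#fragment" so each page is queued once.
                absolute, _fragment = urldefrag(urljoin(page, href))
                if (
                    urlparse(absolute).scheme in ("http", "https")
                    and is_on_domain(absolute, domain)
                    and absolute not in links
                ):
                    links.add(absolute)
                    queue.append(absolute)

    return {"links": links, "emails": emails, "whatsapp": whatsapp}
```

Calling `crawl("https://stei.itb.ac.id/", "stei.itb.ac.id")` would fetch live pages with requests and BeautifulSoup; passing a dictionary-backed `get_hrefs` function instead lets you check the link classification offline.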

Customization

Edit HOME_URL, DOMAIN, TIMEOUT, or other settings in the script to target a different website or adjust the scraping behavior. For example, to start from a different page, change the value of HOME_URL.
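As an illustration, the settings named above and a quick consistency check might look like the sketch below. The three names come from this README, but the values (including the `https://stei.itb.ac.id/` start page and a timeout measured in seconds) and the `check_settings` helper are assumptions, not the contents of main.py.

```python
from urllib.parse import urlparse

# Illustrative values only; check main.py for the real defaults.
HOME_URL = "https://stei.itb.ac.id/"  # page where the crawl starts (assumed value)
DOMAIN = "stei.itb.ac.id"             # only links on this host are followed
TIMEOUT = 10                          # assumed to be seconds per HTTP request


def check_settings(home_url: str, domain: str, timeout: float) -> None:
    """Fail early if the settings are inconsistent (hypothetical helper, not part of main.py)."""
    host = urlparse(home_url).hostname or ""
    if host != domain and not host.endswith("." + domain):
        raise ValueError(f"HOME_URL {home_url!r} is not on DOMAIN {domain!r}")
    if timeout <= 0:
        raise ValueError("TIMEOUT must be positive")


check_settings(HOME_URL, DOMAIN, TIMEOUT)
```

When retargeting the script, changing HOME_URL without also changing DOMAIN is an easy mistake; a check like this catches it before any request is made.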

Output

- Links from the target domain: scrape-links-stei.log
- Email addresses: scrape-email-stei.log
- WhatsApp links: scrape-whatsapp-stei.log
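A minimal sketch of writing these files, assuming each log holds one entry per line in sorted order. The file names are from this README; the format and the helpers `save_results`, `format_log`, and `load_log` are hypothetical.

```python
from pathlib import Path

# File names are taken from this README; the one-entry-per-line, sorted
# layout is an assumption about how main.py writes them.
LOG_FILES = {
    "links": "scrape-links-stei.log",
    "emails": "scrape-email-stei.log",
    "whatsapp": "scrape-whatsapp-stei.log",
}


def format_log(entries: set[str]) -> str:
    """Render a set of results as sorted lines, one entry per line."""
    return "".join(f"{entry}\n" for entry in sorted(entries))


def save_results(results: dict[str, set[str]], directory: str = ".") -> list[Path]:
    """Write each result set to its log file and return the paths written."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for key, filename in LOG_FILES.items():
        path = out_dir / filename
        path.write_text(format_log(results.get(key, set())), encoding="utf-8")
        written.append(path)
    return written


def load_log(path: Path) -> list[str]:
    """Read a log file back into a list of entries, skipping blank lines."""
    return [line for line in path.read_text(encoding="utf-8").splitlines() if line]
```

Sorting the entries keeps the files stable between runs, so the results of two crawls can be compared with an ordinary diff.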

License

This script is provided under the MIT License.


Please adapt the script and README.md to your specific use case or requirements.

Repository: dms-codes/scrape-stei-itb-ac-id
